Abstract. Composition of weighted transducers is a fundamental algorithm used 
in many applications, including for computing complex edit-distances between 
automata, or string kernels in machine learning, or to combine different compo- 
nents of a speech recognition, speech synthesis, or information extraction system. 
We present a generalization of the composition of weighted transducers, 3-way 
composition, which is dramatically faster in practice than the standard composition
algorithm when combining more than two transducers. The worst-case complexity of our
algorithm for composing three transducers $T_1$, $T_2$, and $T_3$ resulting in $T$ is
$O(|T|_Q \min(d(T_1) d(T_3), d(T_2)) + |T|_E)$, where $|\cdot|_Q$ denotes the number of
states, $|\cdot|_E$ the number of transitions, and $d(\cdot)$ the maximum out-degree.
As in regular composition, the use of perfect hashing requires a pre-processing 
step with linear-time expected complexity in the size of the input transducers. In 
many cases, this approach significantly improves on the complexity of standard 
composition. Our algorithm also leads to a dramatically faster composition in 
practice. Furthermore, standard composition can be obtained as a special case of 
our algorithm. We report the results of several experiments demonstrating this im- 
provement. These theoretical and empirical improvements significantly enhance 
performance in the applications already mentioned. 



1 Introduction 

Weighted finite-state transducers are widely used in text, speech, and image process- 
ing applications and other related areas such as information extraction [8, 10, 12, 11, 4]. 
They are finite automata in which each transition is augmented with an output label 
and some weight, in addition to the familiar (input) label [14,5,7]. The weights may 
represent probabilities, log-likelihoods, or they may be some other costs used to rank 
alternatives. They are, more generally, elements of a semiring [7]. 

Weighted transducers are used to represent models derived from large data sets us- 
ing various statistical learning techniques such as pronunciation dictionaries, statistical 
grammars, string kernels, or complex edit-distance models [11,6,2,3]. These models 
can be combined to create complex systems such as a speech recognition or information 
extraction system using a fundamental transducer algorithm, composition of weighted 
transducers [12, 11]. Weighted composition is a generalization of the composition
algorithm for unweighted finite-state transducers, which consists of matching the output
label of the transitions of one transducer with the input label of the transitions of
another transducer. The weighted case is, however, more complex and requires the
introduction of an ε-filter to avoid the creation of redundant ε-paths and preserve the
correct path multiplicity [12, 11]. The result is a new weighted transducer representing
the relational composition of the two transducers.

Composition is widely used in computational biology, text and speech, and machine
learning applications. In many of these applications, the transducers used are quite
large: they may have as many as several hundred million states or transitions. A critical
problem is thus to devise efficient algorithms for combining them. This paper presents
a generalization of the composition of weighted transducers, 3-way composition, that is
dramatically faster than the standard composition algorithm when combining more than
two transducers. The complexity of composing three transducers $T_1$, $T_2$, and $T_3$
with the standard composition algorithm is $O(|T_1| |T_2| |T_3|)$ [12, 11]. Using perfect
hashing, the worst-case complexity of computing $T = (T_1 \circ T_2) \circ T_3$ using
standard composition is

$O(|T|_Q \min(d(T_3), d(T_1 \circ T_2)) + |T|_E + |T_1 \circ T_2|_Q \min(d(T_1), d(T_2)) + |T_1 \circ T_2|_E)$, (1)

which may be prohibitive in some cases even when the resulting transducer $T$ is not
large but the intermediate transducer $T_1 \circ T_2$ is. Instead, the worst-case
complexity of our algorithm is

$O(|T|_Q \min(d(T_1) d(T_3), d(T_2)) + |T|_E)$. (2)

In both cases, the use of perfect hashing requires a pre-processing step with linear-time
expected complexity in the size of the input transducers.

Our algorithm also leads to a dramatically faster computation of the result of com- 
position in practice. We report the results of several experiments demonstrating this 
improvement. These theoretical and empirical improvements significantly enhance per- 
formance in a series of applications: string kernel-based algorithms in machine learn- 
ing, the computation of complex edit-distances between automata, speech recognition 
and speech synthesis, and information extraction. Furthermore, as we shall see later, 
standard composition can be obtained as a special case of 3-way composition. 

The main technical difficulty in the design of our algorithm is the definition of a
filter to deal with a path multiplicity problem that arises in the presence of the empty
string ε in the composition of three transducers. This problem, which we shall describe
in detail, leads to a word combinatorial problem [13]. We will present two solutions
for this problem: one requiring two ε-filters and a generalization of the ε-filters used
for standard composition [12, 11]; and another direct and symmetric solution where a
single filter is needed. Remarkably, this 3-way filter can be encoded as a finite
automaton and painlessly integrated in our 3-way composition.

The remainder of the paper is structured as follows. Some preliminary definitions
and terminology are introduced in the next section (Section 2). Section 3 describes our
3-way algorithm in the ε-free case. The word combinatorial problem of ε-path
multiplicity and our solutions are presented in detail in Section 4. Section 5 reports
the results of experiments using the 3-way algorithm and compares them with the standard
composition.



2 Preliminaries 



This section gives the standard definition and specifies the notation used for weighted 
transducers. 

Finite-state transducers are finite automata in which each transition is augmented 
with an output label in addition to the familiar input label [1,5]. Output labels are 
concatenated along a path to form an output sequence and similarly with input labels. 
Weighted transducers are finite-state transducers in which each transition carries some 
weight in addition to the input and output labels [14, 7]. 

The weights are elements of a semiring, that is, a ring that may lack negation [7].
Some familiar semirings are the tropical semiring
$(\mathbb{R}_+ \cup \{\infty\}, \min, +, \infty, 0)$, related to classical shortest-paths
algorithms, and the probability semiring $(\mathbb{R}, +, \cdot, 0, 1)$. A semiring is
idempotent if for all $a \in \mathbb{K}$, $a \oplus a = a$. It is commutative when
$\otimes$ is commutative. We will assume in this paper that the semiring used is
commutative, which is a necessary condition for composition to be an efficient
algorithm [10].
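
As a minimal illustration of the two semirings just mentioned, the following Python sketch
represents a semiring by its two operations and their identity elements. The `Semiring`
class, the constant names, and the helper `is_idempotent_on` are our own illustrative
choices, not notation from the paper, and the idempotence check only samples a few values.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # the "sum" operation
    times: Callable[[float, float], float]  # the "product" operation
    zero: float                             # identity element for plus
    one: float                              # identity element for times


# Tropical semiring: (min, +) with identities infinity and 0.
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)
# Probability semiring: (+, *) with identities 0 and 1.
PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)


def is_idempotent_on(sr: Semiring, samples) -> bool:
    """Check a plus a == a on a finite sample (a spot check, not a proof)."""
    return all(sr.plus(a, a) == a for a in samples)
```

On such samples the tropical semiring passes the idempotence check while the probability
semiring does not, which is why redundant paths matter in the latter.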

The following gives a formal definition of weighted transducers. 

Definition 1. A weighted finite-state transducer $T$ over
$(\mathbb{K}, \oplus, \otimes, \bar{0}, \bar{1})$ is an 8-tuple
$T = (\Sigma, \Delta, Q, I, F, E, \lambda, \rho)$ where $\Sigma$ is the finite input
alphabet of the transducer, $\Delta$ is the finite output alphabet, $Q$ is a finite set
of states, $I \subseteq Q$ the set of initial states, $F \subseteq Q$ the set of final
states, $E \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times (\Delta \cup \{\epsilon\}) \times \mathbb{K} \times Q$
a finite set of transitions, $\lambda : I \to \mathbb{K}$ the initial weight function,
and $\rho : F \to \mathbb{K}$ the final weight function mapping $F$ to $\mathbb{K}$.

The weight of a path $\pi$ is obtained by multiplying the weights of its constituent
transitions using the multiplication rule of the semiring and is denoted by $w[\pi]$.
The weight of a pair of input and output strings $(x, y)$ is obtained by
$\oplus$-summing the weights of the paths labeled with $(x, y)$ from an initial state to
a final state.

For a path $\pi$, we denote by $p[\pi]$ its origin state and by $n[\pi]$ its destination
state. We also denote by $P(I, x, y, F)$ the set of paths from the initial states $I$ to
the final states $F$ labeled with input string $x$ and output string $y$. A transducer
$T$ is regulated if the output weight associated by $T$ to any pair of strings $(x, y)$:

$T(x, y) = \bigoplus_{\pi \in P(I, x, y, F)} \lambda(p[\pi]) \otimes w[\pi] \otimes \rho[n[\pi]]$ (3)

is well-defined and in $\mathbb{K}$. $T(x, y) = \bar{0}$ when $P(I, x, y, F) = \emptyset$.
If for all $q \in Q$, $\bigoplus_{\pi \in P(q, \epsilon, \epsilon, q)} w[\pi] \in \mathbb{K}$,
then $T$ is regulated. In particular, when $T$ does not admit any ε-cycle, it is
regulated. The weighted transducers we will be considering in this paper will be
regulated. Figure 1(a) shows an example.
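
To make Equation (3) concrete, here is a small Python sketch that computes the weight of a
pair of strings over the probability semiring. It is only a sketch under stated
assumptions: the transducer is ε-free (so matching paths have exactly one transition per
symbol), transitions are given as tuples `(source, input, output, weight, destination)`,
and the function name `pair_weight` and the dictionary encoding of the initial and final
weight functions are ours.

```python
def pair_weight(transitions, initial, final, x, y):
    """Sum, over all paths labeled (x, y), of initial * path weight * final.

    Assumes an epsilon-free transducer over the probability semiring.
    `initial` and `final` map states to their initial and final weights.
    """
    if len(x) != len(y):
        return 0.0  # no epsilon moves, so input and output lengths must agree
    current = dict(initial)  # state -> total weight of partial paths ending there
    for a, b in zip(x, y):
        following = {}
        for (src, i, o, w, dst) in transitions:
            if src in current and i == a and o == b:
                following[dst] = following.get(dst, 0.0) + current[src] * w
        current = following
    # Multiply by the final weight and sum over the final states reached.
    return sum(w * final[q] for q, w in current.items() if q in final)
```

The forward pass relies on distributivity of the product over the sum, so it returns the
same value as enumerating the paths of $P(I, x, y, F)$ one by one.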

The composition of two weighted transducers $T_1$ and $T_2$ with matching input and
output alphabets $\Sigma$ is a weighted transducer denoted by $T_1 \circ T_2$ when the sum:

$(T_1 \circ T_2)(x, y) = \bigoplus_{z \in \Sigma^*} T_1(x, z) \otimes T_2(z, y)$ (4)

is well-defined and in $\mathbb{K}$ for all $x, y \in \Sigma^*$ [14, 7]. Weighted automata
can be defined as weighted transducers $A$ with identical input and output labels, for
any transition. Thus, only pairs of the form $(x, x)$ can have a non-zero weight by $A$,
which is why the weight associated by $A$ to $(x, x)$ is abusively denoted by $A(x)$ and
identified with the weight associated by $A$ to $x$. Similarly, in the graph
representation of weighted automata, the output (or input) label is omitted.

Fig. 1. (a) Example of a weighted transducer $T$. (b) Example of a weighted automaton $A$.
$[[T]](aab, bba) = [[A]](aab) = .1 \times .2 \times .6 \times .8 + .2 \times .4 \times .5 \times .8$.
A bold circle indicates an initial state and a double-circle a final state. The final
weight $\rho[q]$ of a final state $q$ is indicated after the slash symbol representing $q$.
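
Equation (4) can be illustrated at the level of weighted relations, independently of any
automaton representation. In the Python sketch below, each transducer is stood in for by
a finite dictionary mapping a pair of strings to a weight in the probability semiring.
This finite-table encoding and the name `compose_relations` are assumptions made for the
example only.

```python
from collections import defaultdict


def compose_relations(t1, t2):
    """Return (T1 o T2)(x, y) = sum over z of T1(x, z) * T2(z, y).

    t1 and t2 map (input string, output string) pairs to weights.
    """
    # Group the entries of t2 by their input string z.
    by_input = defaultdict(list)
    for (z, y), w2 in t2.items():
        by_input[z].append((y, w2))
    result = defaultdict(float)
    for (x, z), w1 in t1.items():
        for y, w2 in by_input.get(z, []):
            result[(x, y)] += w1 * w2
    return dict(result)
```

For instance, if `t1` maps `("a", "u")` and `("a", "v")` to 0.5 each and `t2` maps
`("u", "k")` to 0.4 and `("v", "k")` to 0.2, both intermediate strings contribute to the
weight of `("a", "k")`.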

3 Epsilon-Free Composition 
3.1 Standard Composition 

Let us start with a brief description of the standard composition algorithm for weighted
transducers [12, 11]. States in the composition $T_1 \circ T_2$ of two weighted
transducers $T_1$ and $T_2$ are identified with pairs of a state of $T_1$ and a state of
$T_2$. Leaving aside transitions with ε inputs or outputs, the following rule specifies
how to compute a transition of $T_1 \circ T_2$ from appropriate transitions of $T_1$ and
$T_2$:

$(q_1, a, b, w_1, q_2)$ and $(q_1', b, c, w_2, q_2') \Longrightarrow ((q_1, q_1'), a, c, w_1 \otimes w_2, (q_2, q_2'))$. (5)

Figure 2 illustrates the algorithm. In the worst case, all transitions of $T_1$ leaving a
state $q_1$ match all those of $T_2$ leaving state $q_1'$, thus the space and time
complexity of composition is quadratic: $O(|T_1| |T_2|)$. However, using perfect hashing
on the input transducer with the highest out-degree leads to a worst-case complexity of
$O(|T_1 \circ T_2|_Q \min(d(T_1), d(T_2)) + |T_1 \circ T_2|_E)$. The pre-processing step
required for hashing the transitions of the transducer with the highest out-degree has an
expected complexity in $O(|T_1|_E)$ if $d(T_1) > d(T_2)$ and $O(|T_2|_E)$ otherwise.
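
The rule (5) and the hashing idea can be sketched in Python as follows. This is an
illustration under several assumptions of ours, not the authors' implementation: weights
are in the probability semiring, a transducer is a dictionary with keys `"initial"` and
`"final"` (state to weight) and `"arcs"` (state to a list of
`(input, output, weight, next_state)` tuples), a Python dictionary stands in for perfect
hashing, and the sketch always indexes the second transducer rather than choosing the one
with the highest out-degree.

```python
from collections import defaultdict, deque


def compose(t1, t2):
    """Epsilon-free composition of two transducers over the probability semiring."""
    # Index the arcs of t2 by (state, input label) so that matches are looked up directly.
    index = defaultdict(list)
    for q, arcs in t2["arcs"].items():
        for (i, o, w, n) in arcs:
            index[(q, i)].append((o, w, n))

    initial = {(p, q): wp * wq
               for p, wp in t1["initial"].items()
               for q, wq in t2["initial"].items()}
    final, out_arcs = {}, defaultdict(list)
    queue, seen = deque(initial), set(initial)
    while queue:
        state = queue.popleft()
        p, q = state
        if p in t1["final"] and q in t2["final"]:
            final[state] = t1["final"][p] * t2["final"][q]
        for (a, b, w1, n1) in t1["arcs"].get(p, []):
            # Rule (5): the output label b of t1 must equal the input label of t2.
            for (c, w2, n2) in index.get((q, b), []):
                nxt = (n1, n2)
                out_arcs[state].append((a, c, w1 * w2, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return {"initial": initial, "final": final, "arcs": dict(out_arcs)}
```

Only pairs of states reachable from the initial pairs are created, which is what the
$|T_1 \circ T_2|_Q$ factor in the complexity bound reflects.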

The main problem with the standard composition algorithm is the following. Assume
that one wishes to compute $T_1 \circ T_2 \circ T_3$, say for example by proceeding left
to right. Thus, first $T_1$ and $T_2$ are composed to compute $T_1 \circ T_2$ and then the
result is composed with $T_3$. The worst-case complexity of that computation is:

$O(|T_1 \circ T_2 \circ T_3|_Q \min(d(T_1 \circ T_2), d(T_3)) + |T_1 \circ T_2 \circ T_3|_E + |T_1 \circ T_2|_Q \min(d(T_1), d(T_2)) + |T_1 \circ T_2|_E)$. (6)



Fig. 2. Example of transducer composition. (a) Weighted transducer $T_1$ and (b) weighted
transducer $T_2$ over the probability semiring $(\mathbb{R}, +, \cdot, 0, 1)$. (c) Result
of the composition of $T_1$ and $T_2$.

But, in many cases, computing $T_1 \circ T_2$ creates a very large number of transitions
that may never match any transition of $T_3$. For example, $T_2$ may represent a complex
edit-distance transducer, allowing all possible insertions, deletions, substitutions
and perhaps other operations such as transpositions or more complex edits in $T_1$, all
with different costs. Even when $T_1$ is a simple non-deterministic finite automaton with
ε-transitions, which is often the case in the applications already mentioned,
$T_1 \circ T_2$ will then have a very large number of paths, most of which will not match
those of the non-deterministic automaton $T_3$. In other applications in speech
recognition, or for the computation of kernels in machine learning, the central
transducer $T_2$ could be far more complex and the set of transitions or paths of
$T_1 \circ T_2$ not matching those of $T_3$ could be even larger.

3.2 3-Way Composition 

The key idea behind our algorithm is precisely to avoid creating these unnecessary
transitions by directly constructing $T_1 \circ T_2 \circ T_3$, which we refer to as a
3-way composition. Thus, our algorithm does not include the intermediate step of creating
$T_1 \circ T_2$ or $T_2 \circ T_3$. To do so, we can proceed following a lateral or
sideways strategy: for each transition $e_1$ in $T_1$ and $e_3$ in $T_3$, we search for
matching transitions in $T_2$.

The pseudocode of the algorithm in the ε-free case is given below. The algorithm
computes $T$, the result of the composition $T_1 \circ T_2 \circ T_3$. It uses a queue $S$
containing the set of triplets of states yet to be examined. The queue discipline of $S$
can be arbitrarily chosen and does not affect the termination of the algorithm. Using a
FIFO or LIFO discipline, the queue operations can be performed in constant time. We can
pre-process the transducer $T_2$ in expected linear time $O(|T_2|_E)$ by using perfect
hashing so that the transitions $G$ (line 13) can be found in worst-case linear time
$O(|G|)$. Thus, the worst-case running time complexity of the 3-way composition algorithm
is in $O(|T|_Q d(T_1) d(T_3) + |T|_E)$, where $T$ is the transducer returned by the
algorithm.

Alternatively, depending on the size of the three transducers, it may be advantageous
to direct the 3-way composition from the center, i.e., ask for each transition $e_2$ in
$T_2$ if there are matching transitions $e_1$ in $T_1$ and $e_3$ in $T_3$. We refer to
this as the central strategy for our 3-way composition algorithm. Pre-processing the
transducers $T_1$ and $T_3$ and creating hash tables for the transitions leaving each
state (the expected complexity of this pre-processing being $O(|T_1|_E + |T_3|_E)$), this
strategy leads to a worst-case running time complexity of $O(|T|_Q d(T_2) + |T|_E)$. The
lateral and central strategies can be combined by using, at a state $(q_1, q_2, q_3)$,
the lateral strategy if $|E[q_1]| \cdot |E[q_3]| < |E[q_2]|$ and the central strategy
otherwise. The algorithm leads to a natural lazy or on-demand implementation in which the
transitions of the resulting transducer $T$ are generated only as needed by other
operations on $T$. The standard composition coincides with the 3-way algorithm when using
the central strategy with either $T_1$ or $T_2$ equal to the identity transducer.

3-WAY-COMPOSITION(T1, T2, T3)

 1  Q ← I1 × I2 × I3
 2  S ← I1 × I2 × I3
 3  while S ≠ ∅ do
 4      (q1, q2, q3) ← HEAD(S)
 5      DEQUEUE(S)
 6      if (q1, q2, q3) ∈ I1 × I2 × I3 then
 7          I ← I ∪ {(q1, q2, q3)}
 8          λ(q1, q2, q3) ← λ1(q1) ⊗ λ2(q2) ⊗ λ3(q3)
 9      if (q1, q2, q3) ∈ F1 × F2 × F3 then
10          F ← F ∪ {(q1, q2, q3)}
11          ρ(q1, q2, q3) ← ρ1(q1) ⊗ ρ2(q2) ⊗ ρ3(q3)
12      for each (e1, e3) ∈ E[q1] × E[q3] do
13          G ← {e ∈ E[q2] : i[e] = o[e1] ∧ o[e] = i[e3]}
14          for each e2 ∈ G do
15              if (n[e1], n[e2], n[e3]) ∉ Q then
16                  Q ← Q ∪ {(n[e1], n[e2], n[e3])}
17                  ENQUEUE(S, (n[e1], n[e2], n[e3]))
18              E ← E ∪ {((q1, q2, q3), i[e1], o[e3], w[e1] ⊗ w[e2] ⊗ w[e3], (n[e1], n[e2], n[e3]))}
19  return T
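
The pseudocode above can be mirrored fairly directly in Python. The sketch below follows
the lateral strategy in the ε-free case over the probability semiring. The dictionary
encoding of a transducer (keys `"initial"`, `"final"`, `"arcs"`), the function name
`three_way_compose`, and the use of a Python dictionary in place of perfect hashing are
our assumptions for illustration.

```python
from collections import defaultdict, deque


def three_way_compose(t1, t2, t3):
    """Epsilon-free 3-way composition (lateral strategy), probability semiring.

    Each transducer is a dict with 'initial' and 'final' (state -> weight) and
    'arcs' (state -> list of (input, output, weight, next_state)).
    """
    # Index arcs of t2 by (state, input, output); a lookup yields the set G of line 13.
    index = defaultdict(list)
    for q, arcs in t2["arcs"].items():
        for (i, o, w, n) in arcs:
            index[(q, i, o)].append((w, n))

    initial = {(a, b, c): wa * wb * wc
               for a, wa in t1["initial"].items()
               for b, wb in t2["initial"].items()
               for c, wc in t3["initial"].items()}
    final, out_arcs = {}, defaultdict(list)
    queue, seen = deque(initial), set(initial)
    while queue:
        state = queue.popleft()
        q1, q2, q3 = state
        if q1 in t1["final"] and q2 in t2["final"] and q3 in t3["final"]:
            final[state] = t1["final"][q1] * t2["final"][q2] * t3["final"][q3]
        for (i1, o1, w1, n1) in t1["arcs"].get(q1, []):
            for (i3, o3, w3, n3) in t3["arcs"].get(q3, []):
                # Arcs of t2 reading the output of e1 and writing the input of e3.
                for (w2, n2) in index.get((q2, o1, i3), []):
                    nxt = (n1, n2, n3)
                    out_arcs[state].append((i1, o3, w1 * w2 * w3, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"initial": initial, "final": final, "arcs": dict(out_arcs)}
```

No intermediate transducer is built: an arc of the middle transducer is touched only
when it matches a pair of arcs from the two outer transducers.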

4 Epsilon filtering 

The algorithm described thus far cannot be readily used in most cases found in practice.
In general, a transducer $T_1$ may have transitions with output label ε and $T_2$
transitions with input ε. A straightforward generalization of the ε-free case would
generate redundant ε-paths and, in the case of non-idempotent semirings, would lead to an
incorrect result, even just for composing two transducers. The weight of two matching
ε-paths of the original transducers would be counted as many times as the number of
redundant ε-paths generated in the result, instead of once. Thus, a crucial component of
our algorithm consists of coping with this problem.

Figure 3(a) illustrates the problem just mentioned in the simpler case of two
transducers. To match ε-paths leaving $q_1$ and those leaving $q_2$, a generalization of
the ε-free composition can make the following moves: (1) first move forward on a
transition of $q_1$ with output ε, or even a path with output ε, and stay at the same
state $q_2$ in $T_2$, with the hope of later finding a transition whose output label is
some label $a \neq \epsilon$ matching a transition of $q_2$ with the same input label;
(2) proceed similarly by following a transition or path leaving $q_2$ with input label ε
while staying at the same state $q_1$ in $T_1$; or, (3) match a transition of $q_1$ with
output label ε with a transition of $q_2$ with input label ε.




Fig. 3. (a) Redundant ε-paths. A straightforward generalization of the ε-free case could
generate all the paths from (0, 0) to (2, 2) for example, even when composing just two
simple transducers. (b) Filter transducer $M$ allowing a unique ε-path.

Let us rename existing output ε-labels of $T_1$ as $\epsilon_2$, and existing input
ε-labels of $T_2$ as $\epsilon_1$, and let us augment $T_1$ with a self-loop labeled with
$\epsilon_1$ at all states and similarly, augment $T_2$ with a self-loop labeled with
$\epsilon_2$ at all states, as illustrated by Figures 5(a) and (c). These self-loops
correspond to staying at the same state in that machine while consuming an ε-label of the
other transducer. The three moves just described now correspond to the matches
(1) $(\epsilon_2 : \epsilon_2)$, (2) $(\epsilon_1 : \epsilon_1)$, and
(3) $(\epsilon_2 : \epsilon_1)$. The grid of Figure 3(a) shows all the possible ε-paths
between composition states. We will denote by $\tilde{T}_1$ and $\tilde{T}_2$ the
transducers obtained after application of these changes.

For the result of composition to be correct, between any two of these states, all
but one path must be disallowed. There are many possible ways of selecting that path.
One natural way is to select the shortest path with the diagonal transitions (ε-matching
transitions) taken first. Figure 3(a) illustrates in boldface the path just described
from state (0, 0) to state (1, 2). Remarkably, this filtering mechanism itself can be
encoded as a finite-state transducer such as the transducer $M$ of Figure 3(b). We write
$(p, q) \preceq (r, s)$ to indicate that $(r, s)$ can be reached from $(p, q)$ in the grid.

Proposition 1. Let $M$ be the transducer of Figure 3(b). $M$ allows a unique path
between any two states $(p, q)$ and $(r, s)$, with $(p, q) \preceq (r, s)$.

Proof. Let $a$ denote $(\epsilon_1 : \epsilon_1)$, $b$ denote $(\epsilon_2 : \epsilon_2)$,
$c$ denote $(\epsilon_2 : \epsilon_1)$, and let $x$ stand for any $(x : x)$, with
$x \in \Sigma$. The following sequences must be disallowed by a shortest-path filter
with matching transitions first: $ab$, $ba$, $ac$, $bc$. This is because, from any state,
instead of the moves $ab$ or $ba$, the matching or diagonal transition $c$ can be taken.
Similarly, instead of $ac$ or $bc$, $ca$ and $cb$ can be taken for an earlier match.
Conversely, it is clear from the grid or an immediate recursion that a filter disallowing
these sequences accepts a unique path between two connected states of the grid.

Let $L$ be the set of sequences over $\sigma = \{a, b, c, x\}$ that contain one of the
disallowed sequences just mentioned as a substring, that is
$L = \sigma^* (ab + ba + ac + bc) \sigma^*$. Then the complement $\bar{L}$ represents
exactly the set of paths allowed by that filter and is thus a regular language. Let $A$
be an automaton representing $L$ (Figure 4(a)). An automaton representing $\bar{L}$ can
be constructed from $A$ by determinization and complementation (Figures 4(a)-(c)). The
resulting automaton $C$ is equivalent to the transducer $M$ after removal of the state 3,
which does not admit a path to a final state. □

Fig. 4. (a) Finite automaton $A$ representing the set of disallowed sequences.
(b) Automaton $B$, result of the determinization of $A$. Subsets are indicated at each
state. (c) Automaton $C$ obtained from $B$ by complementation; state 3 is not coaccessible.

Thus, to compose two transducers $T_1$ and $T_2$ with ε-transitions, it suffices to
compute $\tilde{T}_1 \circ M \circ \tilde{T}_2$, using the rules of composition in the
ε-free case.
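
The uniqueness claim of Proposition 1 can be checked by brute force on small grids. In
the Python sketch below, the moves are encoded as letters: `a` advances the second
transducer only, `b` advances the first transducer only, and `c` advances both, following
the three moves described earlier. The function name `allowed_paths` and the exhaustive
enumeration are our own illustration, not a construction from the paper.

```python
from itertools import product

# Grid displacement of each move as (step in T1, step in T2).
STEP = {"a": (0, 1), "b": (1, 0), "c": (1, 1)}
# Substrings disallowed by the shortest-path filter with matching transitions first.
FORBIDDEN = ("ab", "ba", "ac", "bc")


def allowed_paths(r, s):
    """All move sequences from (0, 0) to (r, s) that avoid every forbidden substring."""
    paths = []
    for length in range(max(r, s), r + s + 1):
        for seq in product("abc", repeat=length):
            word = "".join(seq)
            end = (sum(STEP[m][0] for m in word), sum(STEP[m][1] for m in word))
            if end == (r, s) and not any(f in word for f in FORBIDDEN):
                paths.append(word)
    return paths
```

For the target (1, 2) the only surviving sequence is the diagonal move followed by one
move of the second transducer, which matches the path described for Figure 3(a).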

The problem of avoiding the creation of redundant ε-paths is more complex in 3-way
composition since the ε-transitions of all three transducers must be taken into account.
We describe two solutions for this problem, one based on two filters, another based on
a single filter.

4.1 2-way ε-Filters

One way to deal with this problem is to use the 2-way filter $M$, by first dealing with
matching ε-paths in $U = (T_1 \circ T_2)$, and then in $U \circ T_3$. However, in 3-way
composition, it is possible to remain at the same state of $T_1$ and the same state of
$T_2$, and move on an ε-transition of $T_3$, which previously was not an option. This
corresponds to staying at the same state of $U$, while moving on a transition of $T_3$
with input ε. To account for this move, we introduce a new symbol $\epsilon_0$ matching
$\epsilon_1$ in $\tilde{T}_3$. But, we must also ensure the existence of a self-loop with
output label $\epsilon_0$ at all states of $U$. To do so, we augment the filter $M$ with
self-loops $(\epsilon_1 : \epsilon_0)$ and the transducer $\tilde{T}_2$ with self-loops
$(\epsilon_0 : \epsilon_1)$ (see Figure 5(b)). Figure 5(d) shows the resulting filter
transducer $M_1$. From Figures 5(a)-(c), it is clear that
$\tilde{T}_1 \circ M_1 \circ \tilde{T}_2$ will have precisely a self-loop labeled with
$(\epsilon_1 : \epsilon_1)$ at all states.

In the same way, we must allow for moving forward on a transition of $T_1$ with output
ε, that is consuming $\epsilon_2$, while remaining at the same states of $T_2$ and $T_3$.
To do so, we introduce again a new symbol $\epsilon_0$, this time only relevant for
matching $\tilde{T}_2$ with $\tilde{T}_3$, add self-loops $(\epsilon_2 : \epsilon_0)$ to
$\tilde{T}_2$, and augment the filter $M$ by adding a transition labeled with
$(\epsilon_0 : \epsilon_2)$ (resp. $(\epsilon_0 : \epsilon_1)$) wherever there used to be
one labeled with $(\epsilon_2 : \epsilon_2)$ (resp. $(\epsilon_2 : \epsilon_1)$).
Figure 5(e) shows the resulting filter transducer $M_2$.

Thus, the composition $\tilde{T}_1 \circ M_1 \circ \tilde{T}_2 \circ M_2 \circ \tilde{T}_3$
ensures the uniqueness of matching ε-paths. In practice, the modifications of the
transducers $T_1$, $T_2$, and $T_3$ to generate $\tilde{T}_1$, $\tilde{T}_2$, and
$\tilde{T}_3$, as well as the filters $M_1$ and $M_2$, can be directly simulated or
encoded in the 3-way composition algorithm for greater efficiency. The states in $T$
become quintuples $(q_1, q_2, q_3, f_1, f_2)$ where $f_1$ and $f_2$ are states of the
filters $M_1$ and $M_2$. The introduction of self-loops and marking of εs can be
simulated (lines 12-13) and the filter states $f_1$ and $f_2$ taken into account to
compute the set $G$ of the transition matches allowed (line 13).

Fig. 5. Marking of transducers and 2-way filters. (a) $\tilde{T}_1$. Self-loop labeled
with $\epsilon_1$ added at all states of $T_1$, regular output εs renamed to $\epsilon_2$.
(b) $\tilde{T}_2$. Self-loops with labels $(\epsilon_0 : \epsilon_1)$ and
$(\epsilon_2 : \epsilon_0)$ added at all states of $T_2$. Input εs are replaced by
$\epsilon_1$, output εs by $\epsilon_2$. (c) $\tilde{T}_3$. Self-loop labeled with
$\epsilon_2$ added at all states of $T_3$, regular input εs renamed to $\epsilon_1$.
(d) Left-to-right filter $M_1$. (e) Left-to-right filter $M_2$.

Note that while 3-way composition is symmetric, the analysis of ε-paths just presented
is left-to-right and the filters $M_1$ and $M_2$ are not symmetric. In fact, we could
similarly define right-to-left filters $M_1'$ and $M_2'$. The advantage of the filters
presented in this section is, however, that they make it easy to modify an existing
implementation of composition into 3-way composition. The filters needed for the 3-way
case are also straightforward generalizations of the ε-filter used in standard
composition.

4.2 3-way ε-Filter

There exists, however, a direct and symmetric method for dealing with ε-paths in 3-way
composition. Remarkably, this can be done using a single filter automaton whose labels
are 3-dimensional vectors. Figure 6 shows a filter $W$ that can be used for that purpose.
Each transition is labeled with a triplet. The ith element of the triplet corresponds to
the move on the ith transducer: 0 indicates staying at the same state or not moving, 1
that a move is made reading an ε-transition, and $x$ a move along a matching transition
with a non-empty symbol (i.e., non-ε output in $T_1$, non-ε input or output in $T_2$ and
non-ε input in $T_3$).

Matching ε-paths now correspond to a three-dimensional grid, which leads to a
more complex word combinatorics problem. As in the two-dimensional case,
$(p, q, r) \preceq (s, t, u)$ indicates that $(s, t, u)$ can be reached from $(p, q, r)$
in the grid. Several filters are possible; here we will again favor the matching of
ε-transitions (i.e., the diagonals on the grid).

Proposition 2. The filter automaton $W$ allows a unique path between any two states
$(p, q, r)$ and $(s, t, u)$ of a three-dimensional grid, with $(p, q, r) \preceq (s, t, u)$.



Fig. 6. 3-way matching ε-filter $W$.



Proof. Let $\mathfrak{M}$ and $\mathfrak{X}$ be the sets defined by
$\mathfrak{M} = \{(m_1, m_2, m_3) : m_1, m_2, m_3 \in \{0, 1\}\}$ and
$\mathfrak{X} = \{(x, x, m), (m, x, x) : m \in \{0, 1\}\}$. A sequence of moves
corresponding to a matching ε-path is thus an element of
$(\mathfrak{M} \cup \mathfrak{X})^*$. Two sequences $\pi_1$ and $\pi_2$ are equivalent if
they consume the same sequence of transitions on each of the three transducers, for
example $(0, x, x)(1, 1, 0)$ is equivalent to $(1, x, x)(0, 1, 0)$. For each set of
equivalent move sequences between two states $(p, q, r)$ and $(s, t, u)$, we must preserve
a unique sequence representative of that set. We now define the unique corresponding
representative $\bar{\pi}$ of each sequence $\pi \in (\mathfrak{M} \cup \mathfrak{X})^*$.
In all cases, $\bar{\pi}$ will be the sequence where the 1-moves and the $x$-moves are
taken as early as possible.

1. Assume that $\pi \in \mathfrak{M}^*$ and let $n_i$ be the number of occurrences of 1 as
the ith element in the triplets defining $\pi$. By symmetry, we can assume without loss
of generality that $n_1 \leq n_2 \leq n_3$. We define $\bar{\pi}$ as
$(1, 1, 1)^{n_1} (0, 1, 1)^{n_2 - n_1} (0, 0, 1)^{n_3 - n_2}$, that is the sequence where
the 1-moves are taken as early as possible.

2. Otherwise, $\pi$ can be decomposed as
$\pi = \mu_1 \chi_1 \mu_2 \chi_2 \cdots \mu_k \chi_k \mu_{k+1}$ with $k \geq 1$,
$\mu_i \in \mathfrak{M}^*$ and $\chi_i \in \mathfrak{X}$. $\bar{\pi}$ is then defined by
induction on $k$. By symmetry, we can assume that $\chi_1 = (x, x, m)$ with
$m \in \{0, 1\}$. Let $\pi'$ be such that $\pi = \mu_1 \chi_1 \pi'$, let $n_i$ be the
number of times 1 appears as ith element in a triplet of $\mu_1$, and let $n_3'$ be the
number of times 1 is found as third element in a triplet reading $\chi_1 \pi'$ from left
to right before seeing an $x$.

(a) If $n_3 < \max(n_1, n_2)$, let $n = \min(n_3', \max(n_1, n_2) - n_3)$. We can then
obtain $\chi_1' \pi''$ by replacing the $n$ first 1's that appear in $\chi_1 \pi'$ as
third element of a triplet by 0's. Let $\mu_1' = \mu_1 (0, 0, 1)^n$. We then have that
$\pi$ is equivalent to $\mu_1' \chi_1' \pi''$. By induction, we can compute
$\bar{\mu_1'}$ and $\bar{\pi''}$ and define $\bar{\pi}$ as
$\bar{\mu_1'} \chi_1' \bar{\pi''}$.

(b) If $n_3 > \max(n_1, n_2)$, we define $n$ as $n_3 - \max(n_1, n_2)$ if
$\chi_1 = (x, x, 1)$ and $n_3 - \max(n_1, n_2) - 1$ if $\chi_1 = (x, x, 0)$. Let
$\bar{\mu}_1$ be $(1, 1, 1)^{n_1} (0, 1, 1)^{n_2 - n_1}$ if $n_1 < n_2$ and
$(1, 1, 1)^{n_2} (1, 0, 1)^{n_1 - n_2}$ otherwise. We can then define $\bar{\pi}$ as
$\bar{\mu}_1 (x, x, 1) (0, 0, 1)^n \bar{\pi'}$.



A key property of $\bar{\pi}$ is that it can be characterized by a small set of forbidden
sequences. Indeed, observe that the following rules apply:

1. in two consecutive triplets, for $i \in [1, 3]$, 0 in the ith machine of the first
triplet cannot be followed by 1 in the second. Indeed, as in the 2-way case, if we stay
at a state, then we must remain at that state until a match with a non-empty symbol is
made (this corresponds to cases 1 and 2(a) of the definition of $\bar{\pi}$).

2. two 0s in adjacent transducers ($T_1$ and $T_2$, or $T_2$ and $T_3$) cannot become
both $x$s unless all components become $x$s. For example, the sequence
$(0, 0, 1)(x, x, 1)$ is disallowed since instead $(x, x, 1)(0, 0, 1)$ with an earlier
match can be followed. Similarly, the sequence $(0, 0, 1)(x, x, 0)$ is disallowed since
instead the single and shorter move $(x, x, 1)$ can be taken (this corresponds to case
2(b) of the definition).

3. the triplet $(0, 0, 0)$ is always forbidden since it corresponds to remaining at the
same state in all three transducers.

Conversely, we observe that with our definition of $\bar{\pi}$, these conditions are also
sufficient. Thus, a filter can be obtained by taking the complement of an automaton
accepting exactly the sequences of forbidden substrings just described. The resulting
deterministic and minimal automaton is the filter $W$ shown in Figure 6. Observe that
each state of $W$ has a transition labeled by $(x, x, x)$ going to the initial state 0;
this corresponds to resetting the filter at the end of a matching ε-path. □
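
Part of this argument can be checked by brute force. The Python sketch below is restricted
to sequences made only of 0/1 triplets (case 1 of the definition of the representative),
so rule 2, which involves $x$-moves, plays no role. It enumerates all such sequences
between two grid points and keeps those obeying rules 1 and 3. The function names and the
exhaustive enumeration are our own illustration and do not reproduce the automaton $W$.

```python
from itertools import product

# All 0/1 triplets except (0, 0, 0), which rule 3 always forbids.
MOVES = [m for m in product((0, 1), repeat=3) if m != (0, 0, 0)]


def respects_rule_1(seq):
    """Rule 1: a 0 in component i may not be followed by a 1 in the next triplet."""
    return all(not (u[i] == 0 and v[i] == 1)
               for u, v in zip(seq, seq[1:])
               for i in range(3))


def allowed_sequences(target):
    """Epsilon-only move sequences from (0, 0, 0) to target obeying rules 1 and 3."""
    found = []
    for length in range(max(target), sum(target) + 1):
        for seq in product(MOVES, repeat=length):
            reached = tuple(sum(m[i] for m in seq) for i in range(3))
            if reached == target and respects_rule_1(seq):
                found.append(seq)
    return found
```

For the target (1, 2, 3), the single surviving sequence is (1, 1, 1)(0, 1, 1)(0, 0, 1),
which agrees with the representative defined in case 1 of the proof.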

The filter $W$ is used as follows. A triplet state $(q_1, q_2, q_3)$ in 3-way composition
is augmented with a state $r$ of the filter automaton $W$, starting with state 0 of $W$.
The transitions of the filter $W$ at each state $r$ determine the matches or moves allowed
for that state $(q_1, q_2, q_3, r)$ of the composed machine.

5 Experiments 

This section reports the results of experiments carried out in two different applications: 
the computation of a complex edit-distance between two automata, as motivated by 
applications in text and speech processing [9], and the computation of kernels between 
automata needed in spoken-dialog classification and other machine learning tasks. 

Table 1. Comparison of 3-way composition with standard composition. The computation times 
are reported in seconds, the size of T2 in number of transitions. These experiments were per- 
formed on a dual-core AMD Opteron 2.2GHz with 16GB of memory, using the same software 
library and basic infrastructure. 
8.2 8.2 8.2 


8.2 


3.8 5.9 



Size of T2 70K 100K 130K 160K 190K 220K 25M 75M 

In the edit-distance case, the standard transducer $T_2$ used was one based on all
insertions, deletions, and substitutions with different costs [9]. A more realistic
transducer $T_2$ was one augmented with all transpositions, e.g., $ab \to ba$, with
different costs. In the kernel case, n-gram kernels with varying n-gram order were
used [3].

Table 1 shows the results of these experiments. The finite automata $T_1$ and $T_3$ used
were extracted from real text and speech processing tasks. The results show that in all
cases, 3-way composition is orders of magnitude faster than standard composition.

6 Conclusion 

We presented a general algorithm for the composition of weighted finite-state trans- 
ducers. In many instances, 3-way composition benefits from a significantly better time 
and space complexity. Our experiments with both complex edit-distance computations 
arising in a number of applications in text and speech processing, and with kernel com- 
putations, crucial to many machine learning algorithms applied to sequence prediction, 
show that our algorithm is also substantially faster than standard composition in prac- 
tice. We expect 3-way composition to further improve efficiency in a variety of other 
areas and applications in which weighted composition of transducers is used. 